fix: attach on macOS framework Python (Homebrew, python.org) #31
Merged
On macOS framework Pythons, the running executable is the small Python.app/Contents/MacOS/Python launcher (no Python symbols), while the Python C-API symbols live in the sibling framework dylib at Python.framework/Versions/X.Y/Python. Both attach paths broke on this:
- sys.remote_exec path (CPython >= 3.14): client.py:get_base_addr passed sys.executable (resolves to bin/pythonX.Y) to py_bin_base_addr_locate.sh, which compared it against the lsof-derived launcher path on the server side. The strict string compare aborted with "not in the same python environment" even when client and target shared the same interpreter.
- lldb path (CPython <= 3.13): resolve_bin_path.sh correctly returns the launcher (lldb needs it for `process attach -p` to succeed), but resolve_symbol.sh ran nm against the launcher and found no take_gil symbol, so attach aborted with "test find take_gil function failed".
Fix:
- client.py: pass get_py_bin_path(os.getpid()) to py_bin_base_addr_locate.sh so the client side uses the same lsof resolution as the server side.
- resolve_symbol.sh: when symbol_bin_path points at the framework launcher, redirect nm to the sibling framework dylib. resolve_bin_path.sh stays unchanged so lldb still attaches to the real running executable.
Verified on macOS 15.1.1 arm64:
- Python 3.14.2 arm64 (sys.remote_exec, with sudo for task_for_pid): attach OK, getglobal __main__ i returns live counter.
- Python 3.10.6 x86_64 (lldb path, no sudo): attach OK, getglobal __main__ i returns live counter.
bppps
approved these changes
May 18, 2026
Background
`flight_profiler <pid>` fails to attach on macOS with one of two errors:
- test find take_gil function failed
- flight_profiler and target process are not in the same python environment!
The failure reproduces reliably.
Repro environment
macOS 15.1.1, Darwin 24.1.0, host arch arm64
Two common Python installations:
- Python 3.10 x86_64 (`/usr/local/Cellar/python@3.10/...`)
- Python 3.14 arm64 (`/opt/homebrew/Cellar/python@3.14/...`)
The target process is a simple sleep+print loop.
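The original target script is not reproduced here. As a rough stand-in, a target of this kind might look like the sketch below. The names `tick`, `run`, `TICKS` and `DELAY_SECONDS` are hypothetical; the only detail taken from the PR is that a module-level counter named `i` lives in `__main__`, since verification reads it with `getglobal __main__ i`.

```python
import os
import time

# Module-level counter: the global that `getglobal __main__ i` would read.
i = 0

# Hypothetical defaults, kept small so the sketch terminates on its own.
# Raise TICKS (and DELAY_SECONDS) to get a long-lived process to attach to.
TICKS = 3
DELAY_SECONDS = 0.1


def tick() -> int:
    """Advance the global counter by one and return its new value."""
    global i
    i += 1
    return i


def run(ticks: int, delay: float) -> None:
    """Print the counter `ticks` times, sleeping `delay` seconds after each print."""
    for _ in range(ticks):
        print(tick(), flush=True)
        time.sleep(delay)


if __name__ == "__main__":
    # Print the pid so it can be passed to `flight_profiler <pid>`.
    print(f"pid={os.getpid()}", flush=True)
    run(TICKS, DELAY_SECONDS)
```

Run as a script, the module is `__main__`, so the counter is reachable as `__main__.i` while the loop is still running.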
Repro command:
flight_profiler <pid> --debug --cmd "help"
Root cause
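Key fact: on macOS framework Python, "one Python environment" does not correspond to a single file.
A Python installed from Homebrew or python.org has at least the following paths on disk at the same time:
- `…/Python.framework/Versions/X.Y/Resources/Python.app/Contents/MacOS/Python`: the launcher, which is the executable that actually runs (it is also what `lsof -p` reports); it contains no Python C API symbols.
- `…/Python.framework/Versions/X.Y/Python`: the framework dylib; `take_gil` and the other Python C symbols live here.
- `…/bin/pythonX.Y`: the path that `sys.executable` resolves to.
- `bin/python`: resolves to `bin/pythonX.Y`.
These four paths belong to the same Python environment, but they are different files. Both bugs below come from the old code not handling this correctly.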
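Bug 1: sys.remote_exec path, where the same-environment check resolved the two sides differently
The script involved is `flight_profiler/shell/mac/py_bin_base_addr_locate.sh`. It compares the client's and the server's Python binary paths and aborts if they differ. The old code obtained the two paths in inconsistent ways:
- Server: `lsof -p <server_pid>`, taking the `txt` entry, which is the binary the process is actually running. Result: `…/Python.app/Contents/MacOS/Python` (the launcher).
- Client (old): `sys.executable` passed through `realpath`, which follows the symlink and stops at the bin level. Result: `…/bin/python3.14`.
- Client (new): `lsof -p $$`, also taking the `txt` entry. Result: `…/Python.app/Contents/MacOS/Python`, which matches the server.
The old implementation compared the two values with strict string equality. Even when client and server were started by the same interpreter, the two sides resolved the path differently, so the compared strings were never equal: `…/bin/python3.14` on the client versus `…/Python.app/Contents/MacOS/Python` on the server.
As a result, even a same-environment pair was reported as "not in the same python environment", and attach aborted.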
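The mismatch can be illustrated with a small Python sketch. This is not the shell code from `py_bin_base_addr_locate.sh`; the function name and the `/example/prefix` install prefix are hypothetical, and only the path suffixes come from the description above.

```python
def same_environment_strict(client_bin: str, server_bin: str) -> bool:
    """Strict string equality, mirroring how the check is described above."""
    return client_bin == server_bin


# Hypothetical install prefix, used for illustration only.
PREFIX = "/example/prefix"
LAUNCHER = (
    f"{PREFIX}/Python.framework/Versions/3.14"
    "/Resources/Python.app/Contents/MacOS/Python"
)
BIN_STUB = f"{PREFIX}/bin/python3.14"

server = LAUNCHER      # lsof-derived path of the target process
old_client = BIN_STUB  # sys.executable followed through realpath
new_client = LAUNCHER  # lsof-derived path of the client process

print(same_environment_strict(old_client, server))  # False: same interpreter, different strings
print(same_environment_strict(new_client, server))  # True: both sides resolved the same way
```

The comparison rule itself is the same in both calls; only the way the client path is obtained differs.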
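Evidence of the inconsistency:
The diagnostic in `client.py:show_pre_attach_info()` that prints `Verify pyFlightProfiler and target are using the same python executable: 🌟` itself uses `get_py_bin_path()` (that is, lsof) to obtain both paths. In other words, the previous code resolved paths one way for the diagnostic and a different way for the actual attach check.
The new implementation changes the client path passed to the shell script in `client.py:get_base_addr` from `str(sys.executable)` to `get_py_bin_path(os.getpid())`, so the attach comparison uses the same path resolution as the existing diagnostic. The rule is unchanged (the two binaries must still be equal); only the way equality is determined becomes consistent and trustworthy.
Safety is not weakened: attaching with the flight_profiler from a 3.14 arm64 venv to a 3.10 x86_64 target process (an explicitly cross-environment case) still triggers the check under the new code.
The cross-environment check still works as expected.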
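For readers unfamiliar with the lsof-based resolution, here is a minimal Python sketch of the idea. It is not the repository's `get_py_bin_path`; the function names are hypothetical, and it assumes that the first `txt` entry lsof reports is the main executable.

```python
import subprocess
from typing import Optional


def parse_lsof_txt_path(field_output: str) -> Optional[str]:
    """Return the file name of the first `txt` entry in `lsof -Ffn` output.

    In lsof field output every line starts with a one-character field id:
    `p` is the process id, `f` the file descriptor, `n` the file name.
    """
    in_txt_entry = False
    for line in field_output.splitlines():
        if not line:
            continue
        field, value = line[0], line[1:]
        if field == "f":
            # A new file entry starts; remember whether it is a text image.
            in_txt_entry = value == "txt"
        elif field == "n" and in_txt_entry:
            return value
    return None


def running_binary(pid: int) -> Optional[str]:
    """Ask lsof which executable image the given process is running."""
    result = subprocess.run(
        ["lsof", "-a", "-p", str(pid), "-d", "txt", "-Ffn"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_lsof_txt_path(result.stdout)
```

Calling `running_binary(os.getpid())` on the client and `running_binary(target_pid)` on the server applies the same resolution to both sides, which is the property the fix relies on.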
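Bug 2: lldb path, where nm looked for the symbol in the wrong file
This is unrelated to the same-environment check; it is purely a symbol lookup path problem.
`flight_profiler/shell/resolve_symbol.sh` uses `nm` to find the offset of `take_gil`. The old code ran `nm` on the binary returned by `resolve_bin_path.sh`, and that binary is the launcher, which contains no Python C API symbols. Running `nm` on the launcher therefore never finds the symbol, `resolve_symbol.sh` reports `invalid python process $pid, test find take_gil function failed`, and attach aborts.
Why not make `resolve_bin_path.sh` return the framework dylib as well? Because lldb needs the launcher. If `$py_bin_path` is the framework dylib (not the binary that was actually exec'd), lldb reports `error: no error returned from Target::Attach, and target has no process`. We have observed this in testing: it is a separate regression that was introduced earlier, when someone in the community tried to fix Bug 2 by changing `resolve_bin_path.sh` to redirect the launcher to the dylib.
The correct approach is to separate responsibilities: the launcher is used for attach, and the framework dylib is used for symbol lookup. So this PR:
- leaves `resolve_bin_path.sh` unchanged (it keeps returning the binary that is actually running, that is, the launcher);
- adds a fallback branch to `resolve_symbol.sh` that maps a macOS framework launcher to the dylib of the same framework; only the `nm` query takes this fallback.
The two files already belong to the same Python environment, so the fallback introduces no cross-environment risk.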
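To make the failure mode concrete, the sketch below parses `nm`-style output and looks for a symbol. It is an illustration in Python, not the shell logic in `resolve_symbol.sh`. The function name is hypothetical and the two sample listings are made up, assuming the default `nm` line layout of address, type letter and name, with Mach-O C symbols carrying a leading underscore.

```python
from typing import Optional


def find_symbol_address(nm_output: str, symbol: str) -> Optional[int]:
    """Return the address of `symbol` in `nm` output, or None if it is absent.

    Defined symbols appear as "<hex address> <type letter> <name>".
    Undefined symbols have no address column and are skipped.
    """
    wanted = {symbol, "_" + symbol}
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2] in wanted:
            return int(parts[0], 16)
    return None


# Made-up listings for illustration; real nm output is much longer.
dylib_listing = "0000000000123450 t _take_gil\n                 U _malloc\n"
launcher_listing = "0000000100003f00 T _main\n                 U _exit\n"

print(find_symbol_address(dylib_listing, "take_gil"))     # 1193040
print(find_symbol_address(launcher_listing, "take_gil"))  # None
```

The second lookup returning `None` corresponds to the "test find take_gil function failed" abort: the file that was searched simply does not define the symbol.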
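Fix
Only two files change, with 22 added lines:
- `flight_profiler/client.py`: `get_base_addr` now passes `get_py_bin_path(os.getpid())`, so client and server use the same lsof-resolved path instead of `sys.executable`.
- `flight_profiler/shell/resolve_symbol.sh`: when `symbol_bin_path` matches the `…/Python.app/Contents/MacOS/Python` pattern, `nm` is redirected to the `…/Versions/X.Y/Python` dylib in the same framework.
`resolve_bin_path.sh` is untouched, so lldb still receives the correct launcher for `process attach -p`.
Design principle: on macOS framework Python, the launcher and the framework dylib serve two different purposes (attach uses the launcher, symbol lookup uses the dylib). Do not merge them into a single resolver function.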
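The launcher-to-dylib redirect can be sketched as a pure path transformation. The version below is a Python illustration under stated assumptions, not the shell code added to `resolve_symbol.sh`: the function name and regular expression are hypothetical, and it does not check that the resulting file exists, which a real implementation may want to do.

```python
import re

# Matches a framework launcher path and captures the Versions/X.Y directory.
_LAUNCHER_RE = re.compile(
    r"^(?P<version_dir>.*/Python\.framework/Versions/[^/]+)"
    r"/Resources/Python\.app/Contents/MacOS/Python$"
)


def symbol_binary_for(bin_path: str) -> str:
    """Return the file that symbol lookup should inspect for `bin_path`.

    A framework launcher is mapped to the sibling framework dylib;
    any other path is returned unchanged.
    """
    match = _LAUNCHER_RE.match(bin_path)
    if match is None:
        return bin_path
    return match.group("version_dir") + "/Python"
```

Keeping this as a separate function, applied only on the symbol lookup side, leaves the attach side free to keep using the launcher path as returned.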
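Verification
- Python 3.14.2 arm64 (sys.remote_exec path): attach OK; `getglobal __main__ i` returns a steadily increasing live counter (PEP 768 on macOS still needs `sudo` to satisfy `task_for_pid`, which is unrelated to this PR).
- Python 3.10.6 x86_64 (lldb path): attach OK; `getglobal __main__ i` behaves the same way.
Reproducing the verification: before the fix, the `--debug` output stops at `flight_profiler and target process are not in the same python environment!` or `test find take_gil function failed`; after the fix, both the lldb and the sys.remote_exec paths complete injection and respond to commands.
Out of scope
- Whether `flight_profiler/lib/flight_profiler_agent.dylib` ships with the wheel is a separate question (the dylib must first be produced by `make build`). It is unrelated to this PR and can be handled in a follow-up PR.
- The `sudo` requirement comes from the PEP 768 / `task_for_pid` system constraint and is not a bug.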